India is the world's third-largest producer and third-largest consumer of electricity.
As government policymakers struggle to balance the needs of a power-hungry country against maintaining a sustainable report card, it becomes more important than ever to study the country's power production data.
This is a gentle look into the data to draw out some key insights.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
filename = "file_02.csv"
data = pd.read_csv(filename, thousands=',')
numeric = [
    "Thermal Generation Actual (in MU)", "Thermal Generation Estimated (in MU)",
    "Nuclear Generation Actual (in MU)", "Nuclear Generation Estimated (in MU)",
    "Hydro Generation Actual (in MU)", "Hydro Generation Estimated (in MU)",
]
sectors = ["Thermal","Nuclear","Hydro"]
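# Treat missing readings as zero generation.
data = data.fillna(0)
# Parse the dates and derive Year and Month columns on the main frame.
data['Date'] = pd.to_datetime(data['Date'])
data["Year"] = data['Date'].dt.year
data["Month"] = data['Date'].dt.month
# Sum across regions to get the nationwide total for each day.
date = data.groupby(["Date"])[numeric].sum()
date.to_csv("Daily Total Production.csv")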
Our country's power production has historically been dependent on the fossil fuel sector, and the dependence still shows: over 75% of our needs are still met by non-renewable energy sources.
As global warming and climate change loom large and our incessant power habits have started affecting real life, it becomes really important that our footprint remains sustainable.
Add to this the complicated diplomacy involved in securing fuel from outside the country and the pollution risks of manufacturing within it.
It is hence important to track the progress of our renewable energy investments and to study the related trends to help further our cause.
for i in range(3):
    title = sectors[i] + " Power Plant Total Daily Generation in the Country."
    print(title)
    plt.figure(figsize=(20, 8), dpi=500)
    # Columns 2*i and 2*i+1 of `numeric` are the actual and estimated series for sector i.
    plt.plot(date[numeric[0 + 2*i]], label=numeric[0 + 2*i], linewidth=1)
    plt.plot(date[numeric[1 + 2*i]], label=numeric[1 + 2*i], linewidth=1)
    plt.title(title)
    plt.xlabel("Date")
    plt.ylabel("Generation")
    plt.legend()
    # Save before plt.show(); calling plt.savefig afterwards can write an empty figure.
    plt.savefig("Images/" + title + "png", transparent=True, dpi=250)
    plt.show()
In descending order of generation, India's significant energy sources are:
1. Thermal
2. Hydroelectric
3. Nuclear
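A ranking like the one above can be checked by summing each sector's generation and comparing the shares. The following is a minimal sketch, assuming a small made-up frame `toy` with one column per sector; the numbers are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical daily totals in MU; values are illustrative only.
toy = pd.DataFrame({
    "Thermal": [600.0, 620.0, 610.0],
    "Nuclear": [30.0, 28.0, 32.0],
    "Hydro": [150.0, 170.0, 160.0],
})

# Total generation per sector, largest first.
totals = toy.sum().sort_values(ascending=False)

# Each sector's share of the combined total, in percent.
shares = (100 * totals / totals.sum()).round(1)

print(list(totals.index))
print(shares)
```

With the notebook's own data, the same two steps could be applied to the three "Actual" columns of the daily totals frame.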
We try to see whether regions have unique energy production fingerprints through which they can be identified. This is a better way to home in on the regionwise implications and power situations.
In this section, various models and their accuracy metrics are compared.
data.head()
from sklearn.preprocessing import LabelEncoder,StandardScaler
from sklearn.model_selection import train_test_split
label_encoder = LabelEncoder()
data['Region Number'] = label_encoder.fit_transform(data['Region'])
y = data['Region Number'].copy()
X = data.drop(['Region Number','Region','Date','index'], axis=1).copy()
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
models = [
LogisticRegression(),
SVC(),
MLPClassifier(),
DecisionTreeClassifier(),
AdaBoostClassifier(),
BaggingClassifier(),
GradientBoostingClassifier(),
RandomForestClassifier()
]
model_names = [
" Logistic Regression",
" Support Vector Machine",
" Neural Network",
" Decision Tree",
" AdaBoost Classifier",
" Bagging Classifier",
"Gradient Boosting Classifier",
" Random Forest Classifier"
]
results = []
for model in models:
    model.fit(X_train, y_train)
    # For a classifier, score() returns mean accuracy on the test set, not R squared.
    results.append(model.score(X_test, y_test))
scores = pd.DataFrame(model_names, columns=['Model'])
scores["Accuracy"] = results
scores.to_csv('Spreadsheets/Classification Scores.csv', index=False)
from IPython.display import HTML
HTML(scores.to_html(index=False))
The high accuracy scores imply a strong relationship between a region and its power outputs. More importantly, the fact that the models rarely confuse one region for another suggests that a common policy-for-all approach can be limiting. The Decision Tree, Gradient Boosting and humble Logistic Regression classifiers have been the most impressive.
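For a classifier, scikit-learn's `score` method reports mean accuracy, that is, the fraction of test rows labelled correctly. The sketch below computes that quantity by hand and builds a confusion table showing which regions get mistaken for which. The labels are made up: `y_true` and `y_pred` are hypothetical stand-ins for `y_test` and `model.predict(X_test)`.

```python
import numpy as np
import pandas as pd

# Hypothetical true and predicted region labels for eight test rows.
y_true = np.array(["Northern", "Northern", "Southern", "Southern",
                   "Eastern", "Eastern", "Western", "Western"])
y_pred = np.array(["Northern", "Northern", "Southern", "Eastern",
                   "Eastern", "Eastern", "Western", "Western"])

# Mean accuracy: the fraction of rows whose predicted label matches the true one.
accuracy = float((y_true == y_pred).mean())

# Confusion table: rows are true regions, columns are predicted regions.
confusion = pd.crosstab(pd.Series(y_true, name="True"),
                        pd.Series(y_pred, name="Predicted"))

print(accuracy)
print(confusion)
```

Off-diagonal entries in the table are the confusions; a table that is almost entirely diagonal is what "lack of confusion" looks like in practice.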
Given a region, we investigate the likely interval for the actual generation using the existing data, in order to find the likely range of regionwise power production.
This is hugely helpful when making decisions that involve a risk to power sufficiency.
It can also help scientists and engineers focus on the high-risk scenarios that show poor output.
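One rough way to express such an interval is the regional mean plus or minus one sample standard deviation. The sketch below does this on made-up numbers; the frame `toy` and its `Thermal` column are hypothetical, and the result is a descriptive range rather than a formal confidence interval for the mean.

```python
import pandas as pd

# Hypothetical daily thermal generation (MU) for two regions.
toy = pd.DataFrame({
    "Region": ["Northern"] * 4 + ["Southern"] * 4,
    "Thermal": [600.0, 620.0, 580.0, 600.0, 400.0, 440.0, 360.0, 400.0],
})

# Mean and sample standard deviation per region.
band = toy.groupby("Region")["Thermal"].agg(["mean", "std"])

# One-standard-deviation interval around each regional mean.
band["lower"] = band["mean"] - band["std"]
band["upper"] = band["mean"] + band["std"]

print(band)
```

A region with a wide interval relative to its mean is one where daily output is less predictable.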
region_name = ['Northern', 'Southern', 'Eastern', 'Western', 'NorthEastern']
fig, axs = plt.subplots(5, 3, figsize=(30, 30), dpi=250)
filepath = 'Regionwise'
for i in range(5):
    # Select this region's rows once and reuse them below.
    region_data = data[data['Region'] == region_name[i]]
    region_data.to_csv(filepath + '/' + region_name[i] + '.csv')
    for j in range(3):
        # Positional columns 3+2*j and 4+2*j are taken to be the actual and estimated values for sector j.
        axs[i, j].plot(region_data['Date'], region_data.iloc[:, 3 + 2*j], label="Actual", color='mediumseagreen', linewidth=0.9)
        axs[i, j].plot(region_data['Date'], region_data.iloc[:, 4 + 2*j], label="Estimate", color='cornflowerblue', linewidth=0.9)
        axs[i, j].set_title(region_name[i] + " Daily Power Generation - " + sectors[j])
        axs[i, j].set_xlabel("Date")
        axs[i, j].set_ylabel("Power Generated")
        # Show the Actual/Estimate labels set above.
        axs[i, j].legend()
plt.show()
fig.savefig('Images/Regionwise Sectorwise Power Generation.png', transparent=True, dpi=250)
Regionwise speaking,
We find the mean and standard deviation of monthwise power production to understand two key features.
The mean power production gives us the nominal daily production level for each month.
The standard deviation helps us study the variability of the power production process and invites scientific research into regulating this performance.
Averages = data.groupby(['Year','Month'])[numeric].transform('mean')
STDs = data.groupby(['Year','Month'])[numeric].transform('std')
stat = data.groupby(['Year','Month'])[numeric].agg(['mean','std'])
stat.to_csv('Spreadsheets/Statistical Monthwise.csv')
stat
Since we know $\mu$ and $\sigma$, we can draw a band of width $2\sigma$, from $\mu - \sigma$ to $\mu + \sigma$, in which power production can typically be expected.
The thickness of the band highlights the variability of the daily production capabilities.
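To get a feel for how much of the data such a band actually captures, one can count the days that fall between $\mu - \sigma$ and $\mu + \sigma$ for their month. This is a minimal sketch on made-up numbers; the frame `toy` and its `Generation` column are hypothetical.

```python
import pandas as pd

# Hypothetical daily generation (MU) across two months.
toy = pd.DataFrame({
    "Month": [1, 1, 1, 1, 2, 2, 2, 2],
    "Generation": [100.0, 110.0, 90.0, 140.0, 200.0, 205.0, 195.0, 160.0],
})

# Broadcast each month's mean and standard deviation back onto its rows.
grouped = toy.groupby("Month")["Generation"]
mu = grouped.transform("mean")
sigma = grouped.transform("std")

# Flag the days that fall inside the band from mu - sigma to mu + sigma.
inside = toy["Generation"].between(mu - sigma, mu + sigma)

# Fraction of days covered by the band.
print(inside.mean())
```

Using `transform` rather than `agg` keeps one value per original row, which is what makes the row-by-row comparison possible.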
fig = plt.figure(figsize=(12, 8), dpi=250)
# Keep only the three "Actual" columns.
T = [numeric[i] for i in [0, 2, 4]]
for i in T:
    # Upper edge, monthly mean, and lower edge of the band for this sector.
    plt.plot(STDs[i] + Averages[i], linewidth=0.5, label=i + ' - Upper', linestyle='dotted')
    plt.plot(Averages[i], linewidth=2, label=i)
    plt.plot(Averages[i] - STDs[i], linewidth=0.5, label=i + ' - Lower', linestyle='dotted')
plt.title("Two Sigma Confidence Interval Nationwide Power Generated per Month")
plt.legend()
plt.show()
fig.savefig('Images/Two Sigma Confidence Interval Nationwide Power Generated per Month.png', transparent=True, dpi=250)
Analysis of our industries' performance gives us not only insights into the industries themselves but also hints to policymakers and consumers alike about the current state of the industrial infrastructure, which we have learnt to treat as a black box, concerning ourselves only with the noticeable results.
As far as our Energy Industries are concerned,
https://www.kaggle.com/navinmundhra/daily-power-generation-in-india-20172020
https://npp.gov.in/dashBoard/cp-map-dashboard
https://powermin.gov.in/en/content/power-sector-glance-all-india